Visualizing SMPS Data using the py-smps library

The Scanning Mobility Particle Sizer (SMPS) is a high resolution particle sizer that is commonly used in research for characterizing the size distribution of aerosols.

The py-smps Python library is a simple way to read in SMPS data, analyze it, and visualize it. A loader (smps.io.load_file) imports the data exported from the SMPS, and two plotting functions are available (smps.plots.heatmap and smps.plots.histplot).

Below is a quick tutorial showing how to import the data, inspect it, and plot it. Any bugs in the software can be reported on GitHub.

Requirements

I recommend using Python 3 and leaning heavily on seaborn for visualization. There are three required packages for this library:

Data

To make the process seamless, I recommend exporting your data from the SMPS in column format with a ',' delimiter. For units, dN/dlogDp is preferred, as it is the natural format for aerosol size distributions. An ambient dataset is available here.
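To illustrate why dN/dlogDp is a convenient unit: it divides the concentration counted in each bin by that bin's width in log10 space, so the values do not depend on how finely the instrument slices the size range. Below is a minimal sketch of that normalization. The bin edges are the first bin of the sample dataset used later in this tutorial; the function name and the concentration value are made up for illustration and are not part of py-smps.

```python
import math

def to_dn_dlogdp(dn, lower_um, upper_um):
    """Normalize a per-bin concentration by the bin width in log10 space."""
    return dn / (math.log10(upper_um) - math.log10(lower_um))

# Bin edges in microns (first bin of the sample dataset)
lower, upper = 0.0212875, 0.0220673
dlogdp = math.log10(upper) - math.log10(lower)

# Illustrative value only: particles per cm^3 counted in this bin
dn = 10.0

print(f"dlogDp    = {dlogdp:.5f}")
print(f"dN/dlogDp = {to_dn_dlogdp(dn, lower, upper):.1f}")
```

With 64 channels per decade, every bin has a log10 width of roughly 1/64, so the normalization here amounts to multiplying the per-bin count by about 64.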

Visualization

The beautification of plots is aided by using seaborn. For more information, check out their documentation! It's great.

Import the Library


In [1]:
import smps
import seaborn as sns
import os
import matplotlib
import matplotlib.pyplot as plt
import json

%matplotlib inline

# You can use seaborn to easily control how your plots appear
sns.set('notebook', style='ticks', font_scale=1.5, palette='dark')

smps.set()

print("smps v{}".format(smps.__version__))
print("seaborn v{}".format(sns.__version__))
print("matplotlib v{}".format(matplotlib.__version__))


smps v1.0.0
seaborn v0.9.0
matplotlib v3.0.0

Load the Data into an SMPS object

The SMPS loader (smps.io.load_file) returns an SMPS object which has several attributes including:

  • SMPS.raw
  • SMPS.df
  • SMPS.meta
  • SMPS.bins
  • SMPS.midpoints
  • SMPS.bin_labels
  • SMPS.histogram

smps.io.load_file(fpath, column=True, **kwargs)

Arguments

  • fpath: File Path for the data
  • column: If your data is in 'column' format, set True. Otherwise, set False

In [2]:
bos = smps.io.load_sample("boston")

Explore the SMPS Object

Let's take a look at the SMPS object that was returned by the loader.

SMPS.meta

The SMPS.meta attribute contains the metadata stored in the SMPS text file. It is returned as a Python dictionary.


In [3]:
print(json.dumps(bos.meta, indent=4))


{
    "Sample File": "C:\\Users\\Marduk\\Documents\\SMPS_data\\r20161122_SMPS.6.7.S80",
    "Classifier Model": "3080",
    "DMA Model": "3081",
    "DMA Inner Radius(cm)": "0.00937",
    "DMA Outer Radius(cm)": "0.01961",
    "DMA Characteristic Length(cm)": "0.44369",
    "CPC Model": "3775 Low Flow",
    "Gas Viscosity (kg/(m*s))": "1.822e-005",
    "Mean Free Path (m)": "6.642e-008",
    "Channels/Decade": "64",
    "Multiple Charge Correction": "FALSE",
    "Nanoparticle Aggregate Mobility Analysis": "FALSE",
    "Diffusion Correction": "FALSE",
    "Units": "dw/dlogDp",
    "Weight": "Number",
    "Lower Size (nm)": 21.2875,
    "Upper Size (nm)": 1000.0,
    "weight": "number",
    "units": "dw/dlogdp"
}
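Note that most metadata values are stored as strings, so they need converting before any arithmetic. As a small sketch (using a plain dictionary with three fields copied from the output above rather than a real SMPS object), the channels per decade and the size limits can be combined to estimate how many bins to expect:

```python
import math

# A few fields copied from the metadata printed above
meta = {
    "Channels/Decade": "64",
    "Lower Size (nm)": 21.2875,
    "Upper Size (nm)": 1000.0,
}

# "Channels/Decade" is a string, so convert it before doing arithmetic
channels_per_decade = int(meta["Channels/Decade"])

# Number of decades spanned by the size range
decades = math.log10(meta["Upper Size (nm)"] / meta["Lower Size (nm)"])

# Expected number of size bins
n_bins = round(channels_per_decade * decades)

print(f"{decades:.4f} decades -> {n_bins} bins")
```

The result, 107 bins, matches the length of the midpoints array shown in the next section.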

SMPS.bins and SMPS.midpoints

SMPS.bins is an n×3 array containing the left edge, midpoint, and right edge of each bin in the dataset. SMPS.midpoints is simply the middle column of bins. NOTE: Diameters are expected to be in nm by default; this can be changed with the dp_units argument. All diameters are then converted to microns.


In [4]:
# print out the first 4 bins
bos.bins[0:4]


Out[4]:
array([[0.0212875, 0.0217   , 0.0220673],
       [0.0220673, 0.0225   , 0.0228757],
       [0.0228757, 0.0233   , 0.0237137],
       [0.0237137, 0.0241   , 0.0245824]])

In [5]:
# print out the midpoints
bos.midpoints


Out[5]:
array([0.0217, 0.0225, 0.0233, 0.0241, 0.025 , 0.0259, 0.0269, 0.0279,
       0.0289, 0.03  , 0.0311, 0.0322, 0.0334, 0.0346, 0.0359, 0.0372,
       0.0385, 0.04  , 0.0414, 0.0429, 0.0445, 0.0461, 0.0478, 0.0496,
       0.0514, 0.0533, 0.0552, 0.0573, 0.0594, 0.0615, 0.0638, 0.0661,
       0.0685, 0.071 , 0.0737, 0.0764, 0.0791, 0.082 , 0.0851, 0.0882,
       0.0914, 0.0947, 0.0982, 0.1018, 0.1055, 0.1094, 0.1134, 0.1176,
       0.1219, 0.1263, 0.131 , 0.1358, 0.1407, 0.1459, 0.1512, 0.1568,
       0.1625, 0.1685, 0.1747, 0.1811, 0.1877, 0.1946, 0.2017, 0.2091,
       0.2167, 0.2247, 0.2329, 0.2414, 0.2503, 0.2595, 0.269 , 0.2788,
       0.289 , 0.2996, 0.3106, 0.322 , 0.3338, 0.346 , 0.3587, 0.3718,
       0.3854, 0.3995, 0.4142, 0.4294, 0.4451, 0.4614, 0.4783, 0.4958,
       0.514 , 0.5328, 0.5523, 0.5725, 0.5935, 0.6153, 0.6378, 0.6612,
       0.6854, 0.7105, 0.7365, 0.7635, 0.7915, 0.8205, 0.8505, 0.8817,
       0.914 , 0.9475, 0.9822])
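For intuition, the bin layout above can be approximated from the metadata alone. The sketch below assumes log-spaced bins with geometric midpoints; this is an assumption about how the instrument defines its channels, not py-smps's own code, and make_bins is a hypothetical helper name.

```python
import math

def make_bins(lower_nm, upper_nm, channels_per_decade):
    """Return rows of [left, midpoint, right] in microns for log-spaced bins."""
    n_bins = round(channels_per_decade * math.log10(upper_nm / lower_nm))
    step = 1.0 / channels_per_decade  # bin width in log10 space
    bins = []
    for i in range(n_bins):
        left_nm = lower_nm * 10 ** (i * step)
        right_nm = lower_nm * 10 ** ((i + 1) * step)
        # Geometric midpoint of the bin (an assumption, see the text above)
        mid_nm = math.sqrt(left_nm * right_nm)
        # Convert nm to microns
        bins.append([left_nm / 1000.0, mid_nm / 1000.0, right_nm / 1000.0])
    return bins

bins = make_bins(21.2875, 1000.0, 64)
midpoints = [row[1] for row in bins]

# Print the first 4 bins for comparison with bos.bins[0:4]
for row in bins[:4]:
    print([round(v, 7) for v in row])
```

Rounded to four decimal places, the first and last midpoints from this sketch agree with the values in the array above (0.0217 and 0.9822).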

SMPS.histogram and SMPS.raw

SMPS.histogram contains the histogram as a pandas DataFrame. The index is a time series and can easily be manipulated. SMPS.raw contains both the histogram and all additional information that the SMPS records, including means, modes, etc. It is also a pandas DataFrame.


In [6]:
# Display the first few rows of the DataFrame
bos.data.head(3)


Out[6]:
Sample # bin0 bin1 bin2 bin3 bin4 bin5 bin6 bin7 bin8 ... Status Flag td(s) tf(s) D50(nm) Median(nm) Mean(nm) Geo. Mean(nm) Mode(nm) Geo. Std. Dev. Total Conc.(#/cm³)
timestamp
2016-11-22 15:20:48 1 938.332 1581.720 1219.210 1795.380 1216.890 1670.140 908.874 1653.720 1204.76 ... Normal Scan 2.93 12.4094 1000 40.5183 66.2331 50.1490 24.1442 1.97913 697.18
2016-11-22 15:23:20 2 374.100 234.678 254.937 422.669 372.819 541.616 657.469 897.744 1084.25 ... Normal Scan 2.93 12.4094 1000 61.8646 70.9645 64.1694 61.5265 1.52780 5865.59
2016-11-22 15:25:50 3 5552.500 3805.570 3505.620 4527.480 4050.570 3430.330 3077.630 3346.820 2465.44 ... Normal Scan 2.93 12.4094 1000 41.4080 61.8417 48.9025 21.6739 1.90589 1913.93

3 rows × 132 columns

SMPS.stats

SMPS.stats returns statistics for each scan. You can weight by number, surface area, volume, or mass, and the results include the total number of particles, total surface area, total volume, total mass, the arithmetic mean (AM), the geometric mean (GM), the mode, and the geometric standard deviation (GSD).

In addition, you can integrate or calculate the stats over just a small section of the distribution by leveraging the dmin and dmax arguments.


In [7]:
bos.stats(weight='number').head()


Out[7]:
                          number  surface_area    volume      mass         AM         GM  Mode       GSD
timestamp
2016-11-22 15:20:48   697.179400     19.788759  1.036779  1.710686  66.232849  50.150192  24.1  1.980009
2016-11-22 15:23:20  5865.582059    122.225646  3.126004  5.157906  70.962424  64.168127  61.5  1.527811
2016-11-22 15:25:50  1913.922994     38.724580  1.094102  1.805268  61.841832  48.904347  21.7  1.906091
2016-11-22 15:28:20  1128.932490     19.119169  0.561025  0.925691  55.358279  43.893424  21.7  1.877475
2016-11-22 15:30:49  1118.602001     26.788436  0.814267  1.343541  69.367619  55.631108  21.7  1.916492
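To make the number-weighted statistics and the dmin/dmax arguments more concrete, here is a rough sketch using the textbook definitions of total number, geometric mean, and geometric standard deviation. It is not py-smps's implementation, which may differ in detail; number_weighted_stats is a hypothetical helper, and the input is just the first four bins of the first scan shown earlier.

```python
import math

def number_weighted_stats(midpoints_um, dn_dlogdp, dlogdp, dmin=0.0, dmax=math.inf):
    """Total number, geometric mean (nm), and GSD for bins with dmin <= Dp <= dmax."""
    # Convert dN/dlogDp back to a per-bin concentration and apply the size limits
    pairs = [
        (d, n * dlogdp)
        for d, n in zip(midpoints_um, dn_dlogdp)
        if dmin <= d <= dmax
    ]
    total = sum(n for _, n in pairs)
    # Number-weighted mean and variance of ln(Dp)
    log_gm = sum(n * math.log(d) for d, n in pairs) / total
    variance = sum(n * (math.log(d) - log_gm) ** 2 for d, n in pairs) / total
    return {
        "number": total,
        "GM": math.exp(log_gm) * 1000.0,  # microns -> nm
        "GSD": math.exp(math.sqrt(variance)),
    }

# First four bins of the first scan (dN/dlogDp) and their midpoints in microns
first_scan = [938.332, 1581.720, 1219.210, 1795.380]
midpoints = [0.0217, 0.0225, 0.0233, 0.0241]

full = number_weighted_stats(midpoints, first_scan, 1 / 64)
subset = number_weighted_stats(midpoints, first_scan, 1 / 64, dmin=0.023)

print(full)
print(subset)
```

Restricting the range with dmin simply drops the bins below that diameter before integrating, which is why the subset's total number is smaller than the full total.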

In [8]:
bos.scan_stats.head()


Out[8]:
Status Flag High Voltage Scan Up Time(s) Retrace Time(s) Median(nm) Mode(nm) CPC Inlet Flow(lpm) Total Conc.(#/cm³) Sample # Low Voltage ... Upper Size(nm) Impactor Type(cm) Aerosol Flow(lpm) Down Scan First Density(g/cc) Scans Per Sample D50(nm) td(s) tf(s) Lower Size(nm)
timestamp
2016-11-22 15:20:48 Normal Scan 9735 120 30 40.5183 24.1442 0.3 697.18 1 10.6283 ... 1000 None 0.3 False 1 1 1000 2.93 12.4094 21.2875
2016-11-22 15:23:20 Normal Scan 9735 120 30 61.8646 61.5265 0.3 5865.59 2 10.6283 ... 1000 None 0.3 False 1 1 1000 2.93 12.4094 21.2875
2016-11-22 15:25:50 Normal Scan 9735 120 30 41.4080 21.6739 0.3 1913.93 3 10.6283 ... 1000 None 0.3 False 1 1 1000 2.93 12.4094 21.2875
2016-11-22 15:28:20 Normal Scan 9735 120 30 35.0935 21.6739 0.3 1128.94 4 10.6283 ... 1000 None 0.3 False 1 1 1000 2.93 12.4094 21.2875
2016-11-22 15:30:49 Normal Scan 9735 120 30 54.8718 21.6739 0.3 1118.60 5 10.6283 ... 1000 None 0.3 False 1 1 1000 2.93 12.4094 21.2875

5 rows × 25 columns

We can also resample the data. Under the hood, this method splits the raw DataFrame into numeric and non-numeric columns, then resamples the numeric columns by mean and the non-numeric columns by 'first'. If inplace=True, the resampled data replaces the current raw DataFrame; otherwise, a copy of the object is returned.
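The split-then-resample idea can be sketched with plain pandas. This is only an illustration of the approach described above, not the library's actual code; the small DataFrame reuses the bin0 and Status Flag values from the first three scans shown earlier.

```python
import pandas as pd

# First three scans from the sample data (bin0 and Status Flag only)
index = pd.to_datetime([
    "2016-11-22 15:20:48",
    "2016-11-22 15:23:20",
    "2016-11-22 15:25:50",
])
raw = pd.DataFrame(
    {
        "bin0": [938.332, 374.100, 5552.500],
        "Status Flag": ["Normal Scan", "Normal Scan", "Normal Scan"],
    },
    index=index,
)

# Split into numeric and non-numeric columns
numeric = raw.select_dtypes(include="number")
other = raw.select_dtypes(exclude="number")

# Average the numeric columns; keep the first value of everything else
resampled = pd.concat(
    [numeric.resample("5min").mean(), other.resample("5min").first()],
    axis=1,
)

print(resampled)
```

The first two scans fall into the 15:20 window, so their bin0 values are averaged, while the third scan sits alone in the 15:25 window.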


In [9]:
bos.resample("5min", inplace=True)

bos.data.head(3)


Out[9]:
Sample # bin0 bin1 bin2 bin3 bin4 bin5 bin6 bin7 bin8 ... tf(s) D50(nm) Median(nm) Mean(nm) Geo. Mean(nm) Mode(nm) Geo. Std. Dev. Total Conc.(#/cm³) Impactor Type(cm) Status Flag
timestamp
2016-11-22 15:20:00 1.5 656.2160 908.1990 737.0735 1109.0245 794.8545 1105.8780 783.1715 1275.732 1144.5050 ... 12.4094 1000.0 51.19145 68.59880 57.15920 42.83535 1.753465 3281.385 None Normal Scan
2016-11-22 15:25:00 3.5 5025.4750 3542.6750 3495.1700 3674.2750 3541.0800 3384.4750 3200.3750 2985.765 2390.5950 ... 12.4094 1000.0 38.25075 58.59965 46.39665 21.67390 1.891495 1521.435 None Normal Scan
2016-11-22 15:30:00 5.5 1323.2865 1281.7545 1218.1965 1145.7940 1089.3235 1052.0015 846.4660 846.964 816.1545 ... 12.4094 1000.0 55.81775 75.35480 59.27410 33.09090 1.924750 746.581 None Normal Scan

3 rows × 132 columns

Visualization

Okay. All we really want to do is visualize our data, right? Two common plots are a heatmap-like plot (smps.plots.heatmap) and a particle size distribution (smps.plots.histplot).

Here, we show how to use both of them. Each one returns a matplotlib axes object, which can be manipulated like any other matplotlib object. This makes it easy to alter how the plots look, add labels, etc.

smps.plots.heatmap(X, Y, Z, ax=None, kind='log', cbar=True, cmap=default_cmap, fig_kws=None, cbar_kws=None, **kwargs)

Okay, so all you really need to do to plot the heatmap is give it your X, Y, and Z data:

  • X: Time Axis
  • Y: Bin midpoints
  • Z: Data (usually in the format of $dN/dlogD_p$)

You may think the default colormap is not ideal (it probably isn't), so you can easily change it by feeding it any valid matplotlib colormap object. You can read more about those here or here.
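If it helps to see what a heatmap of this kind involves, the sketch below builds a comparable plot with plain matplotlib: a pcolormesh with a logarithmic color scale and a logarithmic diameter axis. It is an approximation for illustration, not the internals of smps.plots.heatmap, and the data is synthetic (random values, with plain integers standing in for timestamps).

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

# Synthetic stand-ins: 6 scans and 4 bin midpoints in microns
x = np.arange(6)
y = np.array([0.0217, 0.0225, 0.0233, 0.0241])

# Z must have shape (n_bins, n_times), hence the transpose used in the tutorial
rng = np.random.default_rng(0)
z = rng.uniform(100.0, 5000.0, size=(y.size, x.size))

fig, ax = plt.subplots(figsize=(14, 6))

# Any valid matplotlib colormap name or object can be passed as cmap
mesh = ax.pcolormesh(
    x, y, z,
    norm=LogNorm(vmin=z.min(), vmax=z.max()),
    cmap="viridis",
    shading="auto",
)
ax.set_yscale("log")
fig.colorbar(mesh, ax=ax, label="dN/dlogDp")
```

Swapping "viridis" for another colormap name is all it takes to change the look, which is the same idea as the cmap argument described above.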


In [10]:
X = bos.dndlogdp.index
Y = bos.midpoints
Z = bos.dndlogdp.T.values

ax = smps.plots.heatmap(X, Y, Z, cmap='viridis', fig_kws=dict(figsize=(14, 6)))

# Make the x-axis dates look presentable
# (imported as mdates so the name does not clash with the `dates` list used later)
import matplotlib.dates as mdates

ax.xaxis.set_minor_locator(mdates.HourLocator(byhour=[0, 6, 12, 18]))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d\n%b\n%Y"))

# Go ahead and change things!
ax.set_title("Cambridge, MA Wintertime SMPS Data", y=1.02, fontsize=20);


smps.plots.histplot(histogram, bins, ax=None, plot_kws=None, fig_kws=None, **kwargs)

To plot a histogram, you need to provide two pieces of information:

  • histogram: Your histogram data! You can provide it as an array, or as a DataFrame (it will be averaged out)
  • bins: Bin midpoints

There are plenty of ways to customize these plots. You can provide additional keyword arguments for the matplotlib bar chart (plot_kws) or the figure itself (fig_kws). You can also plot on an existing axis by providing that argument.

Example 1

Let's make a basic histogram depicting the particle size distribution over the entire dataset.


In [11]:
ax = smps.plots.histplot(bos.dndlogdp, bos.bins, plot_kws={'linewidth': .01}, fig_kws=dict(figsize=(12,6)))

ax.set_title("Cambridge, MA Wintertime Size Distribution")
# Raw string so that the LaTeX "\;" is not treated as an invalid escape sequence
ax.set_ylabel(r"$dN/dlogD_p \; [cm^{-3}]$")

sns.despine()


Example 2

Let's plot three separate days and make them slightly transparent. Let's also set the linewidth of the individual bars to zero.


In [12]:
dates = ["2016-11-23", "2016-11-24", "2016-11-25"]

ax = None

for i, date in enumerate(dates):
    color = sns.color_palette()[i]
    plot_kws = dict(alpha=0.65, color=color, linewidth=0.)
    
    # Use .loc to select all rows for the given day
    ax = smps.plots.histplot(bos.dndlogdp.loc[date], bos.bins, ax=ax, plot_kws=plot_kws, fig_kws=dict(figsize=(12, 6)))
    
# Add a legend
ax.legend(dates, loc='best')

ax.set_ylabel(r"$dN/dlogD_p \; [cm^{-3}]$")

# Remove the spines
sns.despine()


